home *** CD-ROM | disk | FTP | other *** search
- Non-Recursive String Table for LZW Decompressors - John Jeppson
-
- The standard string table format, as suggested in "A Technique for High-
- Performance Data Compression" by Terry A. Welch, is storage elements of the
- form [prefix]K where [prefix] is a reference to an earlier string in the
- table. Character output then consists of looking up a string table entry and
- generating the output string by recursive decomposition. The characters come
- off one at a time in reverse order, so are generally pushed on the stack and
- copied off as a string, thereby correcting the order.
-
- The [prefix]K format is particularly well suited to compressors since it
- facilitates hashing methods for searching the table. Decompressors, on the
- other hand, never search the table so during decompression this format
- may not yield optimum performance. This file discusses an alternative
- strategy for decompressors.
-
- The trade-off is memory for speed. Computationally intensive recursive
- decomposition is not required, but the entire output charstream must be
- retained in memory as an array of bytes. The string table is implemented as
- a table of pointers (and counts) to "earlier" strings already present in the
- output stream. Output for each received code is then a simple matter of
- copying the string from an earlier site to the present site.
-
- If 'N' is the number of bits/pixel then the first 2**N strings in the string
- table consist of a single character. These single-character strings are
- the "root" codes and when first encountered do NOT previously exist in the
- output stream. All subsequent strings are at least two characters in length
- and do exist in the output charstream.
-
- The string table should consist, therefore, of a set of 4096 records each
- consisting of a pointer and an integer count. (The longest possible string
- is less than 4096 characters, so a 16-bit count will suffice.) The first 2**N
- string table entries should be pointers to a separate static string array,
- and each count should be 1. This static string array can just be a
- pre-initialized array [00,01,02...FE,FF] of byte. Whenever a clear code is
- encountered the string table should be cleared and re-initialized to these
- root-string pointers.
-
- During decompression every code value received requires (1) a write of one or
- more characters to the output stream, and (2) a new entry in the string table
- (except for the first code). The destination address and count of the
- current write and of the previous write are retained and updated as
- decompression proceeds.
-
- When a code is received there are two cases:
-
- 1. value(code) already in string table.
- (a) copy value(code) to the output charstream at curAddress.
- (b) install previousAddress/previousCount+1 as the next entry in the
- string table. Note that by incrementing the count, the new string entry will
- "overlap" onto the first character of the curAddress, which is just what we
- want (i.e. [prefix]K ).
-
- 2. value(code) not yet in string table.
- In this case we copy [prefix]K to the output charstream at curAddress.
- Here [prefix] is the string at previousAddress/previousCount (from the
- previous write operation) and K is the first character of that same string
- (i.e., previousAddress[1]). We then add curAddress/curCount as the next
- entry in the string table.
-
- In every case we are simply copying a string of characters which should be
- much faster than recursive decomposition. It won't work, however, if the
- output charstream is not retained in memory. It also won't work, or at least
- not gracefully, if the output charstream is maintained as small bitfields or
- is broken up by intervening fill. This might be true, for example, if the
- output stream is sent directly to screen memory. In those circumstances
- recursive decomposition is the way to go.
-
- With regard to the "root" string pointers, beware of relocation problems.
- These absolute addresses should be loaded dynamically whenever the string
- table is reset, they should never be hard coded into the application.
-